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Abstract. We consider the following problem: Given a matrix A, find minimal subsets 
of columns of A with cardinality no larger than a given bound that are linear dependent 
or nearly so. This problem arises in various forms in optimization, electrical engineering, 
and statistics. In its full generality, the problem is known to be NP-complete. We present 
a Monte Carlo method that finds such subsets with high confidence. We also give a 
deterministic method that is capable of proving that no subsets of linearly dependent 
columns up to a certain cardinality exist. The performance of both methods is analyzed 
and illustrated with numerical experiments. 



1. Introduction 

Let A be a real M x N matrix with rank m < N. In this paper, the following problem 
is studied: 

Find a minimal set of linearly dependent columns of A, 

that is, a set of linearly dependent columns of A such that each proper subset consists 
of linearly independent columns. An essentially equivalent problem is: 

Find a null vector x of A whose support {i\xi 7^ 0} is minimal. 

This problem is commonly called the sparse null vector problem, and an extension of 
the problem, known as the sparse null space problem, is: 

Find a basis of the null space of A whose vectors all have minimal support. 

The sparse null space problem occurs in optimization and in finite element analysis, 
where it is often of interest to express all solutions of an underdetermined system of 
equations Ax = b (a system of constraint equations or a discrete balance law) in the form 
x = xq + Cu, with a matrix C whose columns span the null space of A. Sparsity of C leads 
to well-known computational advantages. The sparse null space problem is discussed in 
detail in [5 J and [6j, and approximative algorithms for its solution are presented. Further 
work may be found in [2], [9], [16], [19]. The sparse null space problem can be solved with 
a canonical greedy algorithm that looks for a sequence of sparsest linearly independent 
null vectors. On the other hand, it is known that the sparse null vector problem is NP- 
complete in its full generality. Exceptions are also known. For example, a polynomial 
algorithm is known if A is the vertex-edge incidence matrix of a graph, see [1 lj . 

The problem of identifying minimal sets of linearly dependent columns of a matrix 
occurs also in electrical engineering. Here, the goal is to identify the behavior of the com- 
ponents of a circuit from measurements at a set of test points (input frequencies for an 
analog circuit, test words for a digital-to-analog converter, physical test nodes). Then it 
may happen that faults in a group of components are indistinguishable from one another. 
A simple example is a string of electric light bulbs: If one of the bulbs is defective, the 
entire string is dark, and it is not immediate which light has to be replaced. The connec- 
tion with linear algebra comes up as follows. Small deviations from the nominal circuit 
behavior can be described by a matrix equation Ax = b for the linearized response of a 
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circuit near a desired behavior. The vector b encodes deviations from nominal measure- 
ments at various test points, and the unknown x corresponds to deviations of parameters 
for circuit components from their desired status. If a subset of columns of A is linearly 
dependent, then the corresponding components of x are not unique, but satisfy some affine 
relation. Thus the deviations in the behavior of components in this group cannot be deter- 
mined uniquely, and faults cannot be located unambiguously. Such groups of components 
are called ambiguity groups in the engineering literature, and minimal groups are called 
canonical ambiguity groups. Their identification is useful to guide design modifications to 
improve the testability of a circuit. A discussion of this problem and of the numerical 
issues associated with it may be found in [18]. This paper also gives an algorithm (es- 
sentially a complete search) that can lead to the identification of all canonical ambiguity 
groups of moderate size. Further algorithmic approaches were presented in [8], |13j . and 

A common framework for this problem is provided by matroid theory; see e.g. [15] for 
an introduction. Here, subsets of elements of an abstract base set may be independent or 
dependent, and it is of interest to determine minimal dependent sets. Concrete examples 
for dependent sets are linearly dependent sets of vectors or vertices on a closed path in a 
graph. Minimal dependent sets are called circuits in this theory and correspond to closed 
paths with no repeated vertices in the graph theory context. The problem of enumerating 
all circuits up to a given size in a matroid is discussed in pQ. The structure of the set of 
all circuits up to a given size can be very complex. 

In recent work on compressed sensing or compressive sampling, it has been discovered 
that the sparsest solution x (in the sense of having the fewest number of non-zero com- 
ponents) of an underdetermined system of equations Aqx = b may often be recovered 
exactly or approximately by looking for a solution with minimal ^-norm; see e.g. [3J and 
[7]. An otherwise intractable problem can therefore be attacked with linear programming 
methods. It appears therefore to be promising to find minimal sets of linearly dependent 
columns of A by removing a single column (call this column b and call the remaining ma- 
trix Aq) and looking for a sparse solution of the system Aqx = b, using ^-minimization. 
A sparse solution of this problem immediately results in a sparse null vector. Repeating 
this for all columns of A would give all sparse null vectors. However, a typical assumption 
in these results, known as restricted isometry property, is that the condition numbers of 
all M x K submatrices of Aq have to be uniformly bounded, for some sufficiently large 
K. It is easy to see that in this case A cannot have many disjoint linearly dependent 
subsets of size < K. Indeed, for coefficient matrices with many disjoint subsets of linearly 
dependent columns, l l -minimization may fail to detect the sparsest solution of a system 
of linear equations. We illustrate this with the following example. 

Example 1.1. Let B,C be M x L matrices with rank L such that the column spaces 
have trivial intersection (hence 2L < M). For (3, 7 7^ 0, consider the matrix 

A = (B C 0B + jC). 

This matrix has many sets of minimal linearly dependent columns of size 3. Specifically, 
denoting by aj the i-th column of A, all sets of the form {aj, a^+i, aj + 2L}, 1 < i < L have 
this property. Let 0/j/£ be arbitrary, and set b = By. The sparsest solution of the 
equation Ax = b is x = (y T , 0, 0) T . However, if M + |1 + 7I < 1, then the ^-minimal 
solution of this problem is x = (0, —^y T , —(1 + j)y T ) T which has twice as many non- 
vanishing entries as x if 7 7^ —1. That is, if —1 < 7 < and \/3\ > 1 or if —2 < 7 < — 1 
and \P\ > iTT-, the ^-minimal solution is not the sparsest solution. 
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In a sense, this paper is concerned with methods for examining the residual class of ma- 
trices for which ^-minimal solutions of overdetermined linear systems are not necessarily 
sparsest. 

We finally mention some problems in statistics that are related to the topic of this paper. 
Consider a linear regression problem Ax ~ 6, where the columns of A (the predictors) are 
now assumed to be linearly independent. The least squares solution is x = (A T A)- L A T b. 
If A has small singular values, then the estimate x will depend sensitively on small changes 
in the data b and may in fact be nonsensical. Often this instability occurs because a group 
of columns of A is nearly linearly dependent, a phenomenon known as multicollinearity 
(see e.g. p2J). This happens e.g. when similar measures of the same phenomenon are 
included in the set of predictors. While multicollinearity may be addressed by judicious 
choices of the variables that are included in a regression model, the detection of a subset 
of closely related variables in a data set is often interesting in its own right. A similar 
problem is the selection of a small set of variables in a regression model that may be used 
to predict a response. To place this in the present setup, consider the augmented matrix 
A = (A b). One would like to find a minimal set of nearly linearly dependent columns 
of A that contains b. This specific model selection problem is attacked with machine 
learning techniques in [14j . More broadly, the problem of detecting sets of components 
that are nearly linearly related in a high-dimensional data set belongs in the area known 
as association mining, see e.g. [4]. 

In this paper, two algorithms are discussed that may be used to detect or rule out the 
existence of small minimal subsets of columns of an M x A matrix A of rank m that are 
exactly or nearly linearly dependent. Suppose a single such subset of size n is present. 
The first method given here, a random search, is expected to detect it with probability 
1 — e by examining about | loge • p~ n \ submatrices, where p = The sub matrices are 
expected to have about m(l — p) rows and columns each. The second method, a systematic 
search, is capable of ruling out the presence of such a subset by examining about {^) 
submatrices of similar size, although recursive calls of the search routine may increase the 
computational effort. In either case, matrices with small relative rank defects, i.e. p close 
to 1, offer the best chances to detect or rule out the presence of small sets of minimal 
linearly dependent columns. We also give a modification of the random search method for 
the problem of finding minimal subsets of columns that are nearly linearly dependent. 

The paper is organized as follows. Section 2 contains definitions and basic facts. In 
section 3, two versions of the random search method for exactly dependent subsets are 
introduced. The systematic search is presented in section 4. Section 5 contains the mod- 
ification of the random search method for the case of nearly dependent sets of columns. 
In section 6, results from numerical experiments are presented. In section 7, we briefly 
discuss some related problems. 

The author would like to thank G. Stenbakken and T. Souders for introducing him to 
the problem and for many stimulating discussion. 

2. Notation and Auxiliary Results 

Given two nonnegative integers a < b, we denote the set {a, a + 1, . . . , 6} by a, b and 
identify it with the vector (a, a+1, . . . , b). Given A > 1, let I = ii-, ■ ■ ■ , in} C 1, A with 
i\ < ii < We identify X with the vector . . . , i n ) and write i\ = X(l), «2 = 2^(2), 

Let A be a real M x A matrix. Using Matlab notation, for a non-empty set Ic 1, A, 
we write A(:,Z) for the M x \Z\ matrix obtained by extracting all columns with indices in 
I, with the ordering of the rows remaining the same. Similarly, for a set ^ J C 1,M, 
A(J, :) is the x A matrix obtained by extracting all rows whose indices are in J . 
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Then A{J ,1) is the submatrix obtained from A by extracting the rows with indices in J 
and columns with indices in X. If y is a row or column vector, y(X) is the row or column 
vector with the components indexed by X extracted from those of y. The support of the 
vector y is defined as supply) = {i \ yi / 0}. We denote the vn x in identity matrix by H^. 

Assume now that A has exact rank m < min(M, N). One can then write A = LQ 
QlOj), where L is M x m lower triangular and Q is m x N with orthonormal rows. In 
particular, L has a left inverse (L T L)~~ 1 L T , and Q has full rank. Moreover, the null space 
of A is spanned by the columns of an N x (N — m) matrix U with orthonormal columns. 

We are interested in minimal linearly dependent sets of columns of A, that is, subsets 
3 C 1,N such that does not have full rank, but any matrix A(:,J"') with J' C 

3 ', J' ' 7^ J has full rank. Borrowing the corresponding term from matroid theory, such a 
set J or the set of columns indexed by it will be called a circuit in this paper. Recall that 
in the engineering literature on testability, a circuit is called a canonical ambiguity group; 
cf [13\ I18j. If J is a circuit, there exists a N x 1 null vector z of A (or equivalently of Q) 
such that supp(z) = J . Reversely, there exists a circuit J C supp(z) for any null vector 
z. Thus z is a sparse null vector for ^4 if supp(z) is a circuit, and z is also a sparse column 
vector for J7 in this case. Circuits containing only one column clearly must be columns of 
zeroes in A. 

A matrix with linearly independent columns does not have any circuits. More generally, 
a column A(:,j) does not belong to any circuit of A, if this column does not belong to 
the space spanned by the columns A(:,i), i / j or equivalently if the rank of the matrix 
drops if column j is removed. An explicit way of identifying columns that do not belong 
to circuits is given below in Lemma 12.51 We now characterize circuits of A. 

Lemma 2.1. Let A = LQ where Q has orthonormal rows, and let U have orthonormal 
columns such that QU = and the columns ofU span the null space of A. Let J C 1, N, 
and let J c be the complement of J in 1,N. 

(i) Then A(:,J) has a non-trivial null vector if and only ifU{J c ,:) has a non-trivial 
null vector. 

(ii) The following properties are equivalent: 

a) J is a circuit for A. 

b) Q(:,i7) has a one- dimensional null space spanned by a \ J\ - vector w that does not 
vanish anywhere. 

c) U{J C , :) has a non-zero null vector, and for all k £ J and IC = J' C U {k}, the matrix 
U(IC, :) has full rank. 

d) There are K C 1,N with J C K, and an N -vector y with supp(y) = J such that 
Q(:,1C) has a one- dimensional null space generated by y(IC). 

e) There is X C l,iV with XV\J = § such that U(X, :) has a one- dimensional null space 
generated by a vector d, and supp(Ud) = J . 

Proof. To prove (i), observe that any null vector v of A is of the form v = Ud, and the 
map v i— > d is an isomorphism between the null space of A and R N ~ m . If supp{v) C J 
and d^O, then d ^ and v(J c ) = U(J C , :)d = 0. Reversely, if U(J C , :)w = for some 
w 7^ 0, then supp(U w) C J and Uw is a non-trivial null vector of A. 

We now turn to (ii). First note that A(:,/C) and Q(:,JC) have the same null space for 
any /C C 1, N. Let us prove that a) implies b). If a) holds, the null space of Q(:, J) is non- 
trivial. All non-zero null vectors of Q{: J) must have support equal to J , since otherwise 
J would not be minimal. If the null space of Q(:, J) had dimension larger than 1, a linear 
combination of two null vectors could be found that vanishes at an index in J but not 
everywhere. Thus the null space a) of Q{:,J) must be one-dimensional, and b) follows. 
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Reversely, if b) holds, then there exists a nontrivial null vector v of A(:,J'). A non-trivial 
null vector of A{:,J) with strictly smaller support would be linearly independent of v, 
which is not allowed if b) holds. Hence J is a circuit of A. 

To prove the equivalence of a) and c), let J be a circuit of A. Then U(J C , :) has a 
non-trivial null vector by part (i). Now assume that for tC as in the assumption, UQC, :) 
does not have full rank. By (i), A(:,/C c does not have full rank and K. c C J, fC c / J, 
contradicting the minimality of J . Reversely, assume c). By (i), J contains a circuit 
of A. If J were not minimal, we could find J' = J — {k} such that A(:,J'') does not 
have full rank. But then with fC = J' c = J U {&;}, U(IC, :) will not have full rank by (i), 
contradicting c). 

Clearly b) implies d) - just take K, = J . For the reverse conclusion, just take w = y{J)- 
Then w does not vanish anywhere and spans the null space of Q{:,J). 

To prove that a) - d) together imply e), choose X = J c . By (i), there is a null vector d of 
U(J C , :). Then clearly supp(Ud) C J for all such null vectors. If the inclusion were strict, 
e.g. (Ud)^ = with k £ J, then U(J C U {k}, :) would not have full rank, contradicting 
c). Also, if U{J C , :) had two linearly independent null vectors d and d' , then the support 
of a suitable linear combination of Ud and Ud' would be strictly contained in J\ again 
contradicting c). Reversely, if e) holds, then U(J C , :)d = and hence J contains a circuit 
of A by (i). If J were not a circuit of A, e.g. if we could find a circuit J' C J , J' ^ J of 
A, we could find a null vector w = Ud' of A with supp(w) = J' and U(J' C , :)d' = 0, hence 
also U(I, :)d' = 0. Now w = Ud' and x = Ud are linearly independent, since they do not 
have the same support. Hence also d and d' are linearly independent, contradicting the 
assumption that U(I,:) has a one-dimensional null space. Therefore J is a circuit of A. 
So e) implies a). This completes the proof. □ 

With these preparations, a prototype algorithm for detecting circuits can be described. 
It operates on the matrix Q. 

Algorithm 2.2. Let A = LQ as above, where Q is m x N and has rank m. 

1. Choose a subset K, C 1,N. 2. Determine a matrix Y whose columns are a basis of 
the null space of Q(:,IC). 

3. If Y has rank 1, then J = suppiY) is a circuit of A. 

4. If Y is the empty matrix or has more than one column, choose a different subset /C 
and repeat the procedure. 

The algorithm stops with a circuit due to Lemma 2.1 d). The questions then arise how 
to choose /C, what the size of this subset should be, whether one can do better if Y has 
more than one column, how to conclude that there (probably) is no circuit of a given size, 
and so on. 

There is also a version of this prototype algorithm that operates on the matrix U. It 
was proposed in [13J. This algorithm stops after finding a circuit, due to Lemma 2.1 e). 
The same questions about the choice of I and alternate stopping criteria arise. 

Algorithm 2.3. Let U be a N x (N — m) matrix whose columns span the null space of 
A. 

1. Choose a subset X C 1, N. 

2. Determine a matrix Z whose columns are a basis of the null space of U(I, :). 

3. If Z has rank 1 then J = supp{UZ) is a circuit of A. 

4. If Z is the empty matrix or has more than one column, choose a different subset I 
and repeat the procedure. 



6 



HANS ENGLER 



Let us now assume that the last m columns of Q form an invertible matrix. This 
is always possible after permuting columns. Thus Q = {Q\, Q2) = Q2 (Q* ,Im)j where 
Q* = Q2 1 Qi is m x (N — m), and therefore 

(1) A = LQ = L(Q*,I m ) 

with L = LQ2. This decomposition was exploited in [17] to find circuits. Partitioning 
U = ( T j ), where U\ is (N — in) x (N — m) and U2 is m x (N — m), we see that 



= QU = Q2 (Q*U\ + C/2). Then f/j must be invertible, since otherwise U could not have 
full rank. Write 

(2) U* = U 2 UT\ C = i 1 ^™) ,U = CUx, 



then it follows that 



Q* + U* = and AC = 0. 

The matrix C (or rather the set of its columns) is commonly called a fundamental null 
basis, see e.g. [5]. It is now easy to give a version of Lemma 2.1 that uses only properties 
of Q* (or U*), i.e. that refers only to the fundamental null basis C. For any K, C 1,N, set 
)Ci = K(~\1,N — m, IC2 = {i 6 l,m\i+N — m G /C}, and /C2, c = G l,m\i+N — m ^ /C}. 



Lemma 2.4. Zei J7" C l 3 iV, and assume that the factorization |7]j holds. The following 
properties are equivalent: 

a) J is a circuit for A. 

b) Q*{J2,ci-J\) has a one- dimensional null space spanned by a vector w that does not 
vanish anywhere, and Q*{J2-,Ji)w does not vanish anywhere. 

c) There is an index set 1C with $ C K C 1,N such that Q* (^2,0^1) ^ as a one ~ 
dimensional null space generated by a vector w, with 

supp(w) = {i\JCx(i) £ Ji}, supp{Q* (K,2,K,i)w) = {i|/C 2 (i) G J2} ■ 

Proof. Let Q = (Q*,I m ). Let y be an iV-vector, then Q{:,J)y{J) = if and only if 
Q*(J2,c,Ji)y(Ji) = and Q*(J 2 ,Ji)y(Ji) = -y(J 2 )- By Lemma 2.1, statements a) and 
b) are therefore equivalent. 

To show that a) and c) are equivalent, one shows similarly that property d) in Lemma 
2.1 reduces to property c) in the present situation. □ 

Columns of A that do not belong to any circuit can be easily identified if the factorization 
([1]) or equivalently a fundamental null basis C = are given. 

Lemma 2.5. Let A be given such that the factorization holds with a full rank left 
factor L. Column j of A belongs to a circuit of A if and only if either 1 < j < N — m or 
if N — m + 1 < j < N and row j — N + m of Q* does not vanish identically. 

Proof. Note that row j — N + m of the right factor (Q*,I m ) contains a 1 in column j. 
Let e*; denote the k-th standard unit vector. If 1 < j < N — m, then column j of L~ 1 A 
satisfies 



L- 1 A(:,j) = Q*(:,j)= £ Q*(v,j)e v = £ Q*(u,j)A(:,N - m + u) 
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Thus this column is a unique linear combination with non-vanishing coefficients of some 
of the last m columns and therefore is in a circuit. UN — m + 1 < j < N and Q*(j — N + 
m, k) ^ for some k, then again 

L- L A(:,k) =Q*(:,k) = Yl Q* Yl Q*(v,k)A(:,N - m + u) . 

The term corresponding to v = j — N + m cannot be dropped, hence column j belongs to 
a circuit of A. 

Reversely, if iV — m + 1 < j < N and if Q*(j — N + m, :) = 0, then clearly the rank of 
(Q*,I m ) drops if column j is removed, since the remaining matrix will now have a row of 
zeroes. Hence column j does not belong to a circuit of A. □ 

3. Random Search for Circuits 

In this section, we give an Monte Carlo algorithm for finding circuits up to a certain 
size, say n. If there is no such circuit, the algorithm always states this correctly. If there 
is such a circuit, it is found with probability 1 — e in K steps, where K « | log e| (^) and 
in each step typically a null vector of a submatrix of Q or U must be found. 

Algorithm 3.1. Let A = LQ be as above. Let n (the desired circuit size) and e > be 
given. 

0. Set p = 1. 
While p > e, 

1. Choose a random subset /C C 1, N with \)C\ = m+1, uniformly from all these subsets, 
independently from previous selections. 

2. Determine a full rank matrix Z whose columns span the null space of Q(:,IC). 

3. a) If Z = w has rank 1, then set 

J = G supp(w)} . 

The set J" is a circuit of A. If \ J\ < n, STOP and return J . Otherwise set r = |/C|, 
replace p with 

and go to step 1. 

3.b) If rank(Z) = I > 1, replace K. with a random subset K. C /C such that |/C| = 
\K,\ — I + 1 , selected uniformly from all such subsets of /C, and go to step 2. 

Since the matrix Q(:,JC) is m x (m + 1), step 2 always finds a non-trivial matrix Z in 
the first attempt. Should rank(Z) = I be larger than 1, and step 3.c be carried out with 
a smaller /C, then the new matrix Q(:,/C) still has a non-trivial null space, but strictly 
smaller dimensions. Therefore, the method eventually ends up in step 3. a, i.e. it finds 
a subset /C of size r such that Q(:,/C) has a one-dimensional null space. It then inspects 
this subset to determine if it contains a circuit of the desired size. A single pass of the 
algorithm that ends in step 3. a) will be called at trial. It may involve several computations 
of null space bases Z. 

A given circuit J of size n will be detected in step 3. a if J C /C. This happens with 

(N-n\ IN-n\ 

probability ,n\ and fails to happen with probability 1 — ^? , where r = |/C|. At any 

V r ) \r ) 

stage of the algorithm, the value p therefore is the probability that a fixed circuit J of 
size n would not have been detected up to this stage. If this probability is very small (less 
than e), then we may be confident (with a confidence level of 1 — e) that no such circuit 
exists; hence the termination criterion. 



8 



HANS ENGLER 



Let p = and 5 = ^jp- If step 3. a is reached immediately (the generic case), then 
|/C| = m + 1, and the probability that a given circuit J of size n is detected equals 

f3l (m+i- J _-rr ^-ra + j + l > / m + 1 m-n + 2 \ w/2 _ / p(p-(5) \ n/2 
lJ C+i) M + i "V # 'N-n + lJ -\l-Sj 

For small <5, this is approximately equal to p n . Therefore, the probability of not detecting 
a fixed circuit of size n with K independent choices of K, is approximately bounded by 
(1 — p n ) K . This is certainly smaller than a given e if K > ~^° g£ . If there is indeed a circuit 
J of size n, the expected number of trials until it is found is approximately p~ n . Consider 
in particular instances of the problem where p = ^j^- > po, where po > is given, and 
n is fixed. As N, m — ► oo, the probability that a given circuit of size n is detected in a 
single trial is bounded below asymptotically by Pq n . Therefore, the expected number of 
steps to find a circuit of size n is exponential in n, but does not depend directly on the 
problem size Nm. One can expect that circuits of moderate size n are found rapidly if 
the rank defect N — m is small relative to N. 

If there is more than one circuit of size < n, it will take fewer trials to find one of them. 
Suppose there are k circuits of size Cj, j = 1, . . . , k, and assume that the circuits are all 
disjoint and their sizes are small compared to N. The probability of selecting a set K, of 
size m + 1 that contains a specific circuit of size Cj then is pj w p Cj . The probability of 
selecting a K, that contains one of these circuits is approximately 

(4) i_JJ(i_^). 

3=1 

The reciprocal of this number is (close to) the expected number of trials until a circuit is 
found. 

We now give a version of this algorithm that operates on the matrix Q*, where A = 
Li (Q* , I m ) , or equivalently on the fundamental null matrix C. As was noted in the previous 
section, this may require permuting the columns of A. 

Algorithm 3.2. Let A = Li(Q*,I m ) be as above. Let n (the desired circuit size) and 
e > be given. 

0. Set p = 1. 
While p > e, 

1. Choose a random subset /C C 1,N with \IC\ = m + 1, uniformly from all these 
subsets, independently from previous selections. Set K,\ = K. PI 1, N — m, /C2 = {i £ 
l,m\i + N - m G /C}, and /C 2 , c = {iG l,m\i + N - m ^ K.} . 

2. Determine a full rank matrix Z whose columns span the null space of Q*(K.2, c ,^i)- 

3. a) If Z = w has rank 1, then set 

J = \K\(i)\i G supp(w)} U {£.2(1) + N — m\i G supp (Q*(/C2, )Ci)w)}. 

The set J" is a circuit of A. If |J7"| < n, STOP and return J. Otherwise set r = |/C|, 
replace p with 

and go to step 1. 

3.b) If rank(Z) = I > 1, replace /C with a random subset K, d K. such that |/C| = 
|/C| — / + 1 , selected uniformly from all such subsets of /C, and go to step 2. 
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After choosing K. in step 1, the matrix Q* (K,2, c ,fci) is k x (A; + 1), where k = \K.2,c\ < 
min(m, A — m), and therefore step 2 always finds a non-trivial matrix Z in the first 
attempt. Should rank(Z) = I be larger than 1, and step 3.b be carried out, then the new 
matrix Q* (^2,0^1) ^ s obtained from the previous one by deleting s columns and adding 
I — 1 — s rows, where < r < I — 1. As a result, this matrix still has a non-trivial null 
space, but strictly smaller dimensions. Therefore, the method eventually ends up in step 
3. a, i.e. it finds a subset JC of size r such that Q*(K<2^ has a one-dimensional null 
space. It then inspects this subset to determine if it contains a circuit of the desired size. 

If K. is selected randomly such that \JC\ = m + 1, then Q*(K-2,cifci) 1S k X (k + 1) 
with k < m, k < N — m, and k has a hypergeometric distribution with expected value 
( m + 1 )(^- m ) _ ( m _|_ 1) (l — k, Np(l — p). The expected computational effort to find 
a basis of the null space of such a matrix is proportional to A 3 p 3 (l — p) 3 , to leading 
order (pH]). This implies that considerable savings are achieved by precomputing A = 
Li(Q* ,I m ) if the rank m or the rank defect A — m are small relative to A, since then 
p<lorl-p<Cl. 

4. Excluding Circuits of a Certain Size 

In this section, a deterministic algorithm will be given that allows one to conclude 
with certainty that a matrix A does not have a circuit of size n. This is a derandomized 
version of the Monte Carlo algorithm of the previous section. It is based on the following 
observation. 



Lemma 4.1. Let A be an M x N matrix of rank m. Let J\ , . . . , J r C 1, N be disjoint and 
non-empty such that \Jj Jj = 1, N . For C C 1, r, set J{C) = Uj £ e Assume that A has 
a circuit of size n. Then there exists C with \C\ = n such that A(:,J'(C)) has a circuit of 
size n. 



Proof. If A has a circuit Z of size n, we can set C = {j \Jj C\I 7^ 0}. This set has at 
most n elements, and 2 C J{C). Enlarging C if necessary, we obtain a set C C 1, r with n 
elements such that X C <J(C). A vector from the null space of A that is supported on I 
may be restricted to J{C), resulting in a vector in the null space of A(:,J'(C)) with the 
same support. Hence also A(:,J'(C)) has a circuit of size n. □ 

This observation is the basis of a recursive algorithm to determine if a given matrix A 
has a circuit of size n or smaller. The algorithm returns the value true if there is a circuit 
of size at most n and false otherwise. 

Algorithm 4.2. Let A be an M x N with rank(A) = m < min(M, N). Compute a logical 
variable a = circuit find(A, n) as follows. 

0. Compute A = LQ such that Q is m x N. Find the smallest integer r such n\^-~\ < 
m + 1 and set k = [^J • Find disjoints subsets Jx,...,J r C 1, A, all of size k or k + 1. 
Set C = 1, n C 1, r and set a = false. 

1. While C 7^ r — n + 1, r and a = false, do the following: 

l.a) Set c7(C) = Ujgc ^7? an d fi n d the dimension d of the null space of Q(:, J{C)). 

If d = 0, replace C with the next n-element subset of l,r, in lexicographical order and 
return to 1. 

If d = 1, find a non-trivial null vector z of Q{:,J{C)). If supp(z) has no more than n 
elements, STOP and return a = true. Otherwise replace C with the next n-element subset 
of l,r, in lexicographical order and return to 1. 
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If d > 1, set q = circuit/ ind(Q(:, J{C)),n), computed with this algorithm. If a = true, 
STOP. Otherwise replace C with the next n-element subset of 1, r, in lexicographical order 
and return to 1. 

Since m + 1 < N, the integer r computed in step is never smaller than n. In fact 
n = r is only possible if N = m + 1. In this case, the only possible circuit of A is the 
common support of the vectors in the one-dimensional null space of A. Therefore, one 
obtains r > n except in this trivial case. Then there are ( r ) possible matrices Q(:,J'(C)) 
which may be examined in lexicographic order, starting with C = {1, . . . ,n} and ending 
with C = {r — n + 1, . . . , r}. If A has a circuit of size n or smaller, one of these matrices 
must also have such a circuit, by Lemma 14.11 All these matrices have m rows and at 
most m + 1 columns, thus the dimension d of their null spaces may range from d = 
to d = m. If d = 0, the matrix Q(:,J{C)) does not have a circuit, and the algorithm 
examines the next subset J{C). If d = 1, the single possible circuit of this matrix consists 
of the common support of the vectors in this null space, by Lemma 12.11 The algorithm 
examines this circuit and stops with a value a = true if the circuit has size at most n. 
Finally, if d > 1, there may still be a circuit of size n or smaller in Q(: } J{C)), and it can 
be found by applying the same algorithm. This matrix has strictly fewer columns than Q, 
and its rank is also strictly smaller than the rank of Q. Hence the recursion will terminate 
after finitely many calls. 

Using as before the notation p = we see that r ~ np^ 1 . For the class of instances 
where ^ remains bounded away from 0, the number of matrices that has to be examined to 
exclude the presence of circuits up to size n is exponential in n, but it is independent of the 
problem size Nm, assuming of course that the algorithm does not call itself. Specifically, 
consider instances where > Po an d n < for some integer s and some po- Then 

< m + 1, and we may take r = n + s<(l — po)~ 1 s. In this case, the number of 
matrices that has to be examined is approximately bounded by 

( n+ n s )< ( (1 " p ; rls ) ~ (2wr 1/2 a°, a = (i - Por - v° > 1 

by Stirling's formula. Recall that for this class of instances, the expected number of matrix 
examinations to find a circuit of length n with the randomized algorithm of the previous 
section is also bounded independently of Nm and exponential in n. 

As before, it is possible to compute the factorization A = L\(Q*,l m ) and achieve 
additional computational savings. We leave the details to the reader. 

5. Inexact Circuits 

We now discuss situations in which the M x N matrix A has full rank and we wish to 
find a small subset of columns for which a linear combination vanishes approximately. Let 
e > and x be an iV-vector with ||x|| = 1, then we say that X C 1, N is an e-near circuit 
with witness vector x if supp(x) = I, \\Ax\\ < e, and all singular values of A(:,Ii) are 
larger than e whenever I\ C X, X\ ^ X. We are interested in circuits with " few" elements 
(relative to the number of columns N). Thus if a candidate for an e-near circuit with 
corresponding witness vector has been proposed, it is an easy matter to verify this. In 
particular, finding all singular values of A(:,X) is a matter of 0(M\X\ 2 ) operations, and it 
is possible to find all singular values of all matrices A(:,X\) with X\ = X — {A;} for some k 
in 0(M|T| 3 ) operations or faster, if suitable downdating methods are used. 
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Lemma 5.1. Let e > and let A be an M x N matrix with singular values < o\ < a 2 < 
■ ■ ■ < &max and singular value decomposition A = USV T . Let V = (v, V2), where v is the 
first column ofV, corresponding to o\. LetZ C 1, N . 

a) IfZ is an e-near circuit of A with witness vector x and e < 02, then 



€ 



b)If 



then I is an e' -near circuit with 



e><^Jo-j(l-5i) + o-l ax 5i. 



Proof. To prove part a), set x\ = vv T x and X2 = x — x\. Then clearly V 2 T x\ = and 
1 = \\xi\\ 2 + ||^2|| 2 - Also, 

e 2 > \\Axf = WAx.f + \\Ax 2 \\ 2 > \\Ax 2 \\ 2 > a|||x 2 || 2 . 

Consequently 

2 2 

^ > \\ x ^\\ 2 ^J2 x lj = J2 x h and 11^1 11 2 > 1 — 4s - 

a2 m m a2 

Therefore v = HxiH^ 1 ^ has the desired properties. 

To prove part b), let Zj = Vj if j £ I and Zj = otherwise, and set x = \\z\\~ 1 z = (1 — 
5 2 )~ l z. The cosine of the angle enclosed by v and z is \J\ — 8 2 , and hence x = y/l — 5 2 v+8y 
where ||y|| = 1 and (Ay) T Av = 0; therefore ||-Ay|| < a max . Hence 

\\Ax\\ 2 = (1 - 5 2 )\\Av\\ 2 + 8 2 \\Ay\\ 2 < a 2 (l - 5 2 ) + a 2 ma J 2 . 

□ 



This observation suggests that one should look for e-near circuits by selecting subsets 
t C 1,JV such that A(:,IC) has only one singular value o\ < e. A near circuit with a 
witness vector may then be discovered by setting all small components of the corresponding 
singular vector v equal to zero. This is the idea of the following algorithm. It either produces 
a candidate set I for an e-near circuit, or it returns the answer that no such near circuit 
exists. 

Algorithm 5.2. Let A be as above, with M < N. Let the desired circuit size n, 5 G (0, 1), 
and e > be given. 

0. Set p = 1. Determine m such that e separates the m largest singular values of A 
from the M — m smallest singular values. 

While p > 5, 

1. Choose a random subset K, C 1, N with |/C| = m+ 1, uniformly from all these subsets, 
independently from previous selections. 

2. Find the singular value decomposition USV T = A(:,JC), with singular values o\ < 
02 < 

3. a) If o\ < e < a 2 , consider v, the first column of V. Set J equal to the support of the 
n largest entries of v. Compute the smallest singular value of A(:, ZC( J)). If this value is 
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less than e, STOP and return X = 1C{J). Otherwise replace p with 

A (|2|-n)\ 



\ K\K\J / 

and go to step 1. 

3.b) If o\ < 02 < • • • < 07 < e < 07+1, replace K. with a random subset K, C K, with 
/ — 1 fewer elements, selected uniformly from all such subsets of JC, and go to step 2. 

Since the matrix A(:,IC) has m + 1 columns, at least one of its singular values is less or 
equal than e, by well-known interlacing properties of singular values (pH]). Hence either 
case 3. a) or case 3.b) always occurs. Should there be more than one singular value that is 
less or equal than e, then /C is replaced with a smaller /C, and the smaller matrix A(:,fC) 
still has at least one singular value less or equal than e. Therefore, the method eventually 
ends up in step 3. a. By Lemma 15.14 the singular vector corresponding to this singular 
value should suggest a near circuit. The method therefore inspects this singular vector to 
determine a candidate for a circuit of the desired size. 

As before, if no near circuit of size up to n is found before p < 5, we may be confident 
with confidence 1 — 8 that no near circuit of the desired size exists, that is, the search 
has been performed sufficiently often such that a near circuit with the desired properties 
would have been found with probability at least 1 — S, if it existed. The number of trials 
until a near circuit is found has the same distribution as the corresponding quantity in 
algorithm 13.11 If a candidate I has been found, it should still be tested; that is, the 
smallest singular value cril) of A(:,I) should be found as well as the minimal eigenvalues 
(t(T') of all matrices A(:,I') with I' = X — {k}. If cr(I) < e < cr(2') for all i, an e-near 
circuit has been found. 



6. Practical Considerations and Numerical Examples 

When looking for circuits, one should first identify and remove those columns that 
cannot belong to any circuits. This is easily done using Lemma 12.51 and reduces N and 
m = rank(A) by the same fixed amount. Next, one should use a version of the random 
search algorithm 13.1} starting with small circuit sizes. A repeated random search will 
reveal whether there is more than one circuit present. Starting with algorithm 14.21 to find 
circuits is also possible but not recommended, since it may lead to very long run times for 
reasons that will be explained below. 

The main computational step in algorithm 13.11 is the determination of Z, a matrix 
whose columns span the null space of A, in step 2. This has to be done repeatedly until 
rank(Z) = 1. One would expects that rank(Z) = 1 already when the first random subset 
1C of size m + 1 is selected in step 1 of that algorithm. However, if the rank m of A is 
small relative to the number of columns N (e.g. p = ~ 0.3 or smaller), the null space 
dimension rank(Z) is typically larger than 1 for the first selection of /C, and Z must be 
computed again for smaller subsets of columns. This is just the computation of a null 
space of a downdated submatrix, and can be thus be done rapidly, see |10j . 

A numerical experiment was carried out to test if the probability of detecting a circuit 
in a single trial depends only on the ratio of the matrix rank m to the number of columns 

and the size of the circuits that are present. Random matrices with N = 100 columns 
and m = pN rows, with p = ^ G {0.3, 0.5, 0.7, 0.9} were generated that had circuits with 
sizes given by vectors of integers C = (ci, . . . , Cfc). This was done by first generating an 
mx (N—k) matrix A 1 with independent entries drawn from a standard normal distribution, 
choosing k random sets of columns of size c\ — 1, . . . , — 1 of A' and forming k random 
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linear combinations from them, and appending these k vectors to the matrix A' to form 
A. The probability of detecting a circuit in a single trial was estimated from 1000 trials 
applied to the same fixed matrix A. The results are given in the table below. The last 
column of the table contains the approximate probability computed from (J3|). 



p 


C 


N = 100 


N = 200 


N = 400 


N = 800 


Theory 


.9 


(6) 


.55 


.56 


.52 


.55 


.53 


.7 


(5,5,5) 


.42 


.44 


.43 


.42 


.42 


.5 


(4,4) 


.15 


.11 


.11 


.13 


.12 


.3 


(3,3,3,3,3) 


.13 


.14 


.14 


.10 


.13 



The table shows nearly constant detection probabilities, independent of the number of 
columns N, close to the approximate values in the last column. While the actual detection 
probability for a single trial of course does depend on the matrix, the experiment confirms 
the results of the discussion in section 3: The detection probability for a single trial depends 
essentially only on p = S and on the number and sizes of circuits that are present. 

The systematic algorithm 14.21 examines a fixed set of ( r ) submatrices of A for the 
presence of circuits. Unless something is known about the likely location of a circuit, the 
subsets J'i, . . . , J r in step should be generated randomly. The number of submatrices to 
be examined is easy to determine at the start, but it may increase during the execution, 
since the algorithm is recursive. Thus the computational effort may increase substantially 
beyond the initial estimate. The algorithm can also be used to search for a circuit. 

In an experiment, a m x N random matrix with TV" = 100 columns and m = pN 
orthonormal rows was prepared that contained a single circuit of size c = 5. Algorithms l3.il 
and !4.2l were used to find this circuit. The ordering of the columns was permuted randomly 
between attempts. The mean number of times that a nullspace had to be computed until 
the circuit was detected was estimated from 100 attempts for both algorithms. Also 
recorded is the expected number of trials from formula [3l The results are given in the 
table below. 





p = .9 


p = .7 


p = .5 


p=.3 


Algorithm 13.11 


1.61 


5.48 


28.7 


512 


Algorithm IQ 


2.18 


14.9 


66.8 


2270 


Expected trials 


1.69 


5.95 


32 


412 



The table shows that the observed average number of nullspace evaluations tracks the 
expected number of trials closely if random search is used (algorithm 13. ip and also for the 
case of systematic search (algorithm 14. 2p for p close to 1. For smaller p, the systematic 
search algorithm tends to require substantially more nullspace evaluations than the random 
method, no doubt because of the recursion. 

The systematic search algorithm 14.21 could also be used to find all circuits of a given 
size. It turns out that any given circuit will typically be detected many times, if this is 
attempted, leading to very long execution times. This occurs independently of self-calls of 
this algorithm. Hence one should only turn to algorithm 14.21 if the absence of such circuits 
is suspected with high confidence, after repeated random searches have turned up nothing. 

Unlike circuits of a given size, near circuits as defined in section 5 are not unique. Hence 
algorithm 15.21 does not behave as predictably as algorithm 13. 11 In particular, if e is chosen 
too large in this algorithm, then there may be many e-near circuits which can be detected, 
while there are none if e is too small. Using a bisection approach, it is possible to find 
e-near circuits with near minimal e fairly reliably. 
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Given a matrix A (generated at random) , it is observed that the smallest singular values 
of randomly selected submatrices with a fixed number of columns are very nearly normally 
distributed ~ N(fi, a) with [i and a depending on the matrix. Hence randomly selected 
submatrices will very rarely have smallest singular value less than \i — 4<r or so. In this 
situation, Algorithm [5?2] is capable of detecting submatrices for which the smallest singular 
value is less than \i — 8a, with high reliability. 



There are a number of other problems which can be addressed with modifications of the 
methods in this note. We only discuss problems that may be posed for general matrices; 
that is, problems related to circuits in specific matrices coming from graph theory, electrical 
engineering, or statistical applications will not be discussed. 

A simple variation on the task of finding one circuit of a given size is to find all circuits 
up to a given size. Clearly this may be attacked by repeated application of algorithm 13. 11 
Since circuits with fewer columns are more likely to be detected with this method, it may 
be necessary to delete a column in a circuit that has already been found from the matrix 
in order to find specifically longer circuits that do not contain this column. 

A common problem is to find circuits or near circuits that have prescribed intersection 
properties with a given collection of sets of columns. This occurs e.g. in statistics, where 
a portion of the variables may be thought of as predictors and another one as responses. 
Circuits that contain one response and a small number predictors are of special interest 
in problems of model selection. Algorithms 13.11 and 14.21 are easily modified to handle such 
situations. 

One may be interested in sampling randomly from the set of all e-near circuits up to 
a given size, for a given e. Algorithm 14.21 effectively provides such a random sampling 
scheme. However, it is unclear what the sampling distribution is in this case, and it 
appears to be difficult and laborious to estimate its properties. 



circuits of A\ has been identified. Is it possible to exploit this information to speed up 
the detection of circuits of the full matrix A? An extreme version of this task, related to 
subspace tracking, occurs if rows are added to A one at a time and a set of circuits has 
to be maintained or modified. Progress on such incremental algorithms has been made in 
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